Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
Liu, Tianchi, Truong, Duc-Tuan, Das, Rohan Kumar, Lee, Kong Aik, Li, Haizhou
Speech foundation models have significantly advanced various speech-related tasks by providing exceptional representation capabilities. However, their high-dimensional output features often create a mismatch with downstream task models, which typically require lower-dimensional inputs. A common solution is to apply a dimensionality reduction (DR) layer, but this approach increases parameter overhead and computational costs and risks losing valuable information. To address these issues, we propose Nested Res2Net (Nes2Net), a lightweight back-end architecture designed to directly process high-dimensional features without DR layers. The nested structure enhances multi-scale feature extraction, improves feature interaction, and preserves high-dimensional information. We first validate Nes2Net on CtrSVDD, a singing voice deepfake detection dataset, and report a 22% performance improvement and an 87% back-end computational cost reduction over the state-of-the-art baseline. Additionally, extensive testing across four diverse datasets (ASVspoof 2021, ASVspoof 5, PartialSpoof, and In-the-Wild), covering fully spoofed speech, adversarial attacks, partial spoofing, and real-world scenarios, consistently highlights Nes2Net's superior robustness and generalization capabilities. The code package and pre-trained models are available at https://github.com/Liu-Tianchi/Nes2Net.
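The abstract above describes processing high-dimensional foundation-model features directly, with a nested multi-scale structure, instead of inserting a DR layer. As a minimal numpy sketch of the general Res2Net-style multi-scale idea (not the Nes2Net architecture itself; the group count, the linear maps standing in for small convolutions, and all dimensions are assumptions):

```python
import numpy as np

def res2net_style_block(x, scales=4, rng=None):
    """Res2Net-style multi-scale processing of one high-dimensional feature
    vector without a dimensionality-reduction layer (illustrative sketch only).

    The feature is split into `scales` groups; each group after the first is
    transformed together with the output of the previous group, creating
    hierarchical multi-scale interactions, and the groups are re-concatenated
    at the original dimensionality.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    assert d % scales == 0
    g = d // scales
    groups = x.reshape(scales, g)
    # Hypothetical per-scale linear transforms standing in for small convs.
    weights = [rng.standard_normal((g, g)) / np.sqrt(g) for _ in range(scales - 1)]
    outs = [groups[0]]          # first group passes through, as in Res2Net
    prev = groups[0]
    for i in range(1, scales):
        prev = np.tanh(weights[i - 1] @ (groups[i] + prev))  # nested interaction
        outs.append(prev)
    return np.concatenate(outs)  # same dimensionality as the input

x = np.random.default_rng(0).standard_normal(1024)  # e.g. a foundation-model feature
y = res2net_style_block(x, scales=4, rng=0)
print(y.shape)  # (1024,): no dimensionality reduction happened
```

The point of the sketch is only that multi-scale interaction can be had while keeping the full feature dimensionality end to end.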
Support vector machines and linear regression coincide with very high-dimensional features
The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector. In this paper, we explore the generality of this phenomenon and make the following contributions. First, we prove a super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models, matching the upper bounds from previous works. We further identify a sharp phase transition in Gaussian feature models, bound the width of this transition, and give experimental support for its universality. Finally, we hypothesize that this phase transition occurs only in much higher-dimensional settings in the \ell_1 variant of the SVM, and we present a new geometric characterization of the problem that may elucidate this phenomenon for the general \ell_p case.
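The proliferation phenomenon described above is easy to observe numerically: when every training point becomes a support vector, the hard-margin SVM makes all margin constraints tight and therefore coincides (up to the intercept) with the minimum-norm least squares interpolator of the labels. A small demo, assuming scikit-learn is available (the sample size, dimension, and `C` value are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 20, 5000                       # d >> n: the proliferation regime
X = rng.standard_normal((n, d))
y = np.repeat([1.0, -1.0], n // 2)    # arbitrary balanced +/-1 labels

# Near-hard-margin SVM (large C): in this regime, typically every
# training point becomes a support vector.
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w_svm = svm.coef_.ravel()

# Minimum Euclidean norm least squares interpolation of the labels.
w_ls = X.T @ np.linalg.solve(X @ X.T, y)

cos = w_svm @ w_ls / (np.linalg.norm(w_svm) * np.linalg.norm(w_ls))
print(len(svm.support_), cos)  # expect 20 support vectors, cosine near 1
```

With the dimension lowered toward the super-linear threshold the paper bounds, the two solutions separate and the support vector count drops below n.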
Feature Clock: High-Dimensional Effects in Two-Dimensional Plots
Ovcharenko, Olga, Sevastjanova, Rita, Boeva, Valentina
Humans struggle to perceive and interpret high-dimensional data. Therefore, high-dimensional data are often projected into two dimensions for visualization. Many applications benefit from complex nonlinear dimensionality reduction techniques, but the effects of individual high-dimensional features are hard to explain in the two-dimensional space. Most visualization solutions use multiple two-dimensional plots, each showing the effect of one high-dimensional feature in two dimensions; this approach creates a need for a visual inspection of k plots for a k-dimensional input space. Our solution, Feature Clock, provides a novel approach that eliminates the need to inspect these k plots to grasp the influence of original features on the data structure depicted in two dimensions. Feature Clock enhances the explainability and compactness of visualizations of embedded data and is available in an open-source Python library.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Software (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
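One simple way to realize the idea behind Feature Clock, replacing k per-feature plots with per-feature directions in a single 2-D view, is to regress each original feature on the two embedding coordinates and read off the angle of the coefficient vector. This is an illustrative sketch of that idea, not the library's actual algorithm or API:

```python
import numpy as np

def feature_directions(X_high, X_2d):
    """For each high-dimensional feature, fit feature ~ a*emb_x + b*emb_y + c
    on the 2-D embedding and return the angle of (a, b): the direction in the
    plot along which that feature increases."""
    A = np.column_stack([X_2d, np.ones(len(X_2d))])      # [x, y, 1]
    coefs, *_ = np.linalg.lstsq(A, X_high, rcond=None)   # shape (3, d)
    a, b = coefs[0], coefs[1]
    return np.arctan2(b, a)                              # one angle per feature

rng = np.random.default_rng(0)
emb = rng.standard_normal((200, 2))                      # a 2-D embedding
# Feature 0 increases exactly along the x-axis, feature 1 along the y-axis.
X = np.column_stack([emb[:, 0], emb[:, 1], rng.standard_normal(200)])
angles = feature_directions(X, emb)
print(np.degrees(angles[:2]))  # approximately [0, 90]
```

Drawing all angles on one "clock" then summarizes the influence of every original feature in a single plot.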
PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer
Marri, Samhita, Sivakumar, Arun N., Uppalapati, Naveen K., Chowdhary, Girish
Tracking plant features is crucial for various agricultural tasks like phenotyping, pruning, or harvesting, but the unstructured, cluttered, and deformable nature of plant environments makes it a challenging task. In this context, recent advancements in foundation models show promise in addressing this challenge. In our work, we propose PlantTrack, where we utilize DINOv2, which provides high-dimensional features, and train a keypoint heatmap predictor network to identify the locations of semantic features such as fruits and leaves, which are then used as prompts for point tracking across video frames using TAPIR. We show that with as few as 20 synthetic images for training the keypoint predictor, we achieve zero-shot Sim2Real transfer, enabling effective tracking of plant features in real environments.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.63)
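The heatmap-to-prompt step in pipelines like the one above reduces to extracting peak locations from the predicted keypoint heatmap and handing them to the tracker. A generic sketch of that step (PlantTrack's own predictor and the TAPIR interface are not reproduced here; the heatmap values are made up):

```python
import numpy as np

def heatmap_peaks(heatmap, k=3):
    """Extract the k highest-scoring (row, col) locations from a keypoint
    heatmap, to be used as point-tracking prompts."""
    flat = heatmap.ravel()
    idx = np.argsort(flat)[::-1][:k]              # indices of top-k scores
    rows, cols = np.unravel_index(idx, heatmap.shape)
    return list(zip(rows.tolist(), cols.tolist()))

h = np.zeros((8, 8))
h[2, 3], h[5, 6], h[0, 0] = 0.9, 0.8, 0.7         # three synthetic "fruit" peaks
print(heatmap_peaks(h, k=3))  # [(2, 3), (5, 6), (0, 0)]
```

A real predictor would add non-maximum suppression so that nearby pixels of one blob do not all become prompts, but the top-k extraction is the core of it.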
ResSurv: Cancer Survival Analysis Prediction Model Based on Residual Networks
Survival prediction is an important branch of cancer prognosis analysis. Models that predict survival risk from TCGA genomics data can discover cancer-related genes and provide diagnosis and treatment recommendations based on patient characteristics. We found that deep learning models based on Cox proportional hazards often suffer from overfitting when dealing with high-throughput data. Moreover, we observed that simply increasing the number of network layers does not improve results and instead causes network degradation. To address this problem, we propose a new framework based on deep residual learning that combines the ideas of Cox proportional hazards and residual connections, which we name ResSurv. First, ResSurv is a feed-forward deep learning network stacked from multiple basic ResNet blocks; in each ResNet block, we add a normalization layer to prevent vanishing and exploding gradients. Second, for the loss function of the neural network, we inherit the Cox proportional hazards method: we apply the semi-parametric form of the CPH model to the neural network, establish the loss function from the partial likelihood, and perform backpropagation and gradient updates. Finally, we compared ResSurv networks of different depths and found that the model can effectively extract high-dimensional features. Ablation and comparative experiments show that our model reaches the state of the art (SOTA) among deep learning approaches and that our network effectively extracts deep information.
- North America > United States > New York (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
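The Cox partial-likelihood loss that such networks minimize has a compact form: for each observed event, the patient's risk score is compared against the log-sum-exp of scores over everyone still at risk at that time. A minimal numpy sketch, assuming no tied event times (the toy data are invented):

```python
import numpy as np

def cox_partial_likelihood_loss(risk, time, event):
    """Negative log partial likelihood of the Cox proportional-hazards model,
    averaged over observed events (simplified sketch; no ties handled).

    risk  : predicted log-risk score per patient (the network output)
    time  : observed follow-up time per patient
    event : 1 if the event was observed, 0 if censored
    """
    loss = 0.0
    for i in range(len(risk)):
        if event[i]:
            at_risk = time >= time[i]      # risk set at the event time t_i
            loss -= risk[i] - np.log(np.sum(np.exp(risk[at_risk])))
    return loss / max(1, int(np.sum(event)))

risk = np.array([0.5, -0.2, 1.0])
time = np.array([2.0, 3.0, 1.0])
event = np.array([1, 0, 1])
loss = cox_partial_likelihood_loss(risk, time, event)
print(round(loss, 4))
```

Because the loss only involves the network's scalar outputs, it plugs into any backbone, residual or not, via standard backpropagation.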
Manifold-based Shapley for SAR Recognition Network Explanation
Hu, Xuran, Zhu, Mingzhe, Liu, Yuanjing, Feng, Zhenpeng, Stankovic, LJubisa
Explainable artificial intelligence (XAI) holds immense significance in enhancing the transparency and credibility of deep neural networks, particularly in risky and high-cost scenarios such as synthetic aperture radar (SAR). Shapley is a game-theoretic explanation technique with robust mathematical foundations. However, Shapley assumes that a model's features are independent, rendering Shapley explanations invalid for high-dimensional models. This study introduces a manifold-based Shapley method that projects high-dimensional features into low-dimensional manifold features and subsequently obtains Fusion-Shap, which aims at (1) addressing the erroneous explanations produced by traditional Shap and (2) resolving the challenge of interpretability that traditional Shap faces in complex scenarios.
- Europe > Montenegro (0.05)
- Asia > China (0.05)
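One practical payoff of explaining a handful of manifold features rather than the raw high-dimensional input is that exact Shapley values become tractable by enumerating coalitions. A generic sketch (Fusion-Shap's manifold projection is not reproduced; the baseline-replacement convention, the linear toy model, and all values are illustrative assumptions):

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features replaced by a
    baseline value -- tractable here because only a few (e.g. manifold)
    features are being explained."""
    k = len(x)
    phi = np.zeros(k)
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for r in range(k):
            for S in itertools.combinations(others, r):
                weight = math.factorial(r) * math.factorial(k - r - 1) / math.factorial(k)
                z_with = baseline.copy()
                z_with[list(S) + [i]] = x[list(S) + [i]]
                z_without = baseline.copy()
                z_without[list(S)] = x[list(S)]
                phi[i] += weight * (f(z_with) - f(z_without))
    return phi

# Sanity check: for a linear model, Shapley values have the closed
# form w_i * (x_i - mu_i).
w = np.array([2.0, -1.0, 0.5])
mu = np.array([0.1, 0.2, 0.3])            # baseline: feature means
x = np.array([1.0, 0.0, -1.0])
phi = exact_shapley(lambda z: w @ z, x, mu)
print(np.allclose(phi, w * (x - mu)))     # True
```

For k projected features this enumeration costs O(2^k) model calls, which is exactly why projecting to a low-dimensional manifold first makes the explanation both feasible and faithful to feature dependence.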
SpaceEditing: Integrating Human Knowledge into Deep Neural Networks via Interactive Latent Space Editing
Wei, Jiafu, Xia, Ding, Xie, Haoran, Chang, Chia-Ming, Li, Chuntao, Yang, Xi
We propose an interactive editing method that allows humans to help deep neural networks (DNNs) learn a latent space more consistent with human knowledge, thereby improving classification accuracy on indistinguishable ambiguous data. First, we visualize high-dimensional data features through dimensionality reduction methods and design an interactive system, \textit{SpaceEditing}, to display the visualized data. \textit{SpaceEditing} provides a 2D workspace based on the idea of spatial layout, in which the user can move the projected data according to system guidance. \textit{SpaceEditing} then finds the high-dimensional features corresponding to the data moved by the user and feeds them back to the network for retraining, thereby allowing the user to interactively modify the high-dimensional latent space. Second, to more rationally incorporate human knowledge into the training process of neural networks, we design a new loss function that enables the network to learn user-modified information. Finally, we demonstrate how \textit{SpaceEditing} meets user needs through three case studies while evaluating our proposed method, and the results confirm its effectiveness.
- Europe > Austria > Burgenland > Eisenstadt (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > China (0.04)
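A loss of the kind the abstract describes can be pictured as a standard classification term plus a term pulling each sample's 2-D projection toward the position the user dragged it to. This is an illustrative sketch only; the `alpha` weight, the squared-distance form, and the single-sample setup are assumptions, not the paper's exact formulation:

```python
import numpy as np

def space_editing_loss(logits, label, z_2d, user_pos, alpha=0.5):
    """Cross-entropy plus a user-guidance term on the 2-D projection
    (single-sample sketch)."""
    # cross-entropy of one sample via log-softmax
    logp = logits - np.log(np.sum(np.exp(logits)))
    ce = -logp[label]
    # guidance: match the position the user moved this sample to
    guide = np.sum((z_2d - user_pos) ** 2)
    return ce + alpha * guide

logits = np.array([2.0, 0.5, -1.0])
loss = space_editing_loss(logits, label=0,
                          z_2d=np.array([1.0, 1.0]),
                          user_pos=np.array([0.0, 1.0]))
print(round(loss, 4))
```

Backpropagating the guidance term through the projection is what lets the user's 2-D edits reshape the high-dimensional latent space during retraining.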
Learning to Answer Multilingual and Code-Mixed Questions
Question answering (QA), which comes naturally to humans, is a critical component of seamless human-computer interaction. It has emerged as one of the most convenient and natural methods of interacting with the web and is especially desirable in voice-controlled environments. Despite being one of the oldest research areas, current QA systems face the critical challenge of handling multilingual queries. To build an Artificial Intelligence (AI) agent that can serve multilingual end users, a QA system must be language-versatile and tailored to the multilingual environment. Recent advances in QA models have enabled them to surpass human performance, primarily due to the availability of sizable, high-quality datasets. However, the majority of such annotated datasets are expensive to create and are confined to the English language, making it challenging to track progress in other languages. Therefore, to measure similar improvements in multilingual QA systems, it is necessary to invest in high-quality multilingual evaluation benchmarks. In this dissertation, we focus on advancing QA techniques for handling end-user queries in multilingual environments. The dissertation consists of two parts. In the first part, we explore multilingualism and a new dimension of multilingualism referred to as code-mixing. In the second, we propose a technique to solve the task of multi-hop question generation by exploiting multiple documents. Experiments show that our models achieve state-of-the-art performance on answer extraction, ranking, and generation tasks across multiple domains of MQA, VQA, and language generation. The proposed techniques are generic and can be widely applied across domains and languages to advance QA systems.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.27)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
- Asia > India > Himachal Pradesh > Shimla (0.04)
- (51 more...)
- Summary/Review (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.92)
- Media > Film (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Transportation > Ground (0.92)
- (6 more...)
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Sun, Jianyuan, Liu, Xubo, Mei, Xinhao, Plumbley, Mark D., Kilic, Volkan, Wang, Wenwu
Automated audio captioning (AAC) aims to describe the content of an audio clip using simple sentences. Existing AAC methods are built on an encoder-decoder architecture whose success is largely attributed to using a pre-trained CNN10, called PANNs, as the encoder to learn rich audio representations. AAC remains highly challenging because its high-dimensional latent space covers audio from diverse scenarios. Existing methods use only the high-dimensional representation of PANNs as the input to the decoder; however, low-dimensional representations may retain audio information that the high-dimensional representation neglects. In addition, predicting captions from the high-dimensional representation alone, learned from existing audio captions, lacks robustness and efficiency. To address these challenges, we propose a new encoder-decoder framework called the Low- and High-Dimensional Feature Fusion (LHDFF) model for AAC. In LHDFF, a new PANNs encoder called Residual PANNs (RPANNs) fuses the low-dimensional feature from an intermediate convolutional layer with the high-dimensional feature from the final layer of PANNs. To fully exploit the fused low- and high-dimensional feature and the high-dimensional feature respectively, we propose dual transformer decoders that generate captions in parallel. In particular, a probabilistic fusion approach ensures that overall system performance improves by concentrating on the respective advantages of the two decoders. Experimental results show that LHDFF achieves the best performance on the Clotho and AudioCaps datasets compared with other existing models.
- Europe > United Kingdom > England > Surrey (0.04)
- Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
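The probabilistic fusion of two decoders can be pictured as mixing their next-token distributions so that the more confident decoder dominates each step. The confidence-based weighting below is an assumption chosen for illustration; the paper's exact fusion rule may differ, and the distributions are invented:

```python
import numpy as np

def fuse_token_distributions(p1, p2):
    """Fuse two decoders' next-token distributions, weighting each decoder
    by its own confidence (its max probability)."""
    c1, c2 = p1.max(), p2.max()
    lam = c1 / (c1 + c2)
    return lam * p1 + (1 - lam) * p2

p_fused = fuse_token_distributions(np.array([0.7, 0.2, 0.1]),
                                   np.array([0.4, 0.5, 0.1]))
print(p_fused.sum())            # still a valid distribution (sums to 1)
print(int(np.argmax(p_fused)))  # token selected after fusion
```

Because the fused vector is a convex combination of two distributions, it is itself a distribution, so fusion can be applied independently at every decoding step.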
Consensual Aggregation on Random Projected High-dimensional Features for Regression
In this paper, we present a study of kernel-based consensual aggregation on randomly projected high-dimensional features of predictions for regression. The aggregation scheme consists of two steps: first, the high-dimensional features of predictions, given by a large number of regression estimators, are randomly projected into a smaller subspace using the Johnson-Lindenstrauss lemma; second, a kernel-based consensual aggregation is implemented on the projected features. We theoretically show that, with high probability, the performance of the aggregation scheme is close to that of aggregation implemented on the original high-dimensional features. Moreover, we numerically illustrate that the scheme upholds its performance on very large and highly correlated prediction features given by different types of machines. The aggregation scheme allows us to flexibly merge a large number of redundant machines, plainly constructed without model selection or cross-validation. The efficiency of the proposed method is illustrated through several experiments on different types of synthetic and real datasets.
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia > Tasmania (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Israel (0.04)
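The two-step scheme above can be sketched in a few lines: a Gaussian (Johnson-Lindenstrauss style) matrix projects the vector of machine predictions into k dimensions, and a Nadaraya-Watson average over training targets, with a Gaussian kernel on the projected predictions, gives the aggregate. The bandwidth `h`, the projection scaling, and the synthetic "machines" are assumptions; the paper's tuning procedure is not reproduced:

```python
import numpy as np

def aggregate(R_train, y_train, r_new, k=5, h=0.5, seed=0):
    """Kernel-based consensual aggregation on randomly projected
    prediction features (illustrative sketch).

    R_train : (n, M) predictions of M machines on n training points
    y_train : (n,)  training targets
    r_new   : (M,)  machine predictions for the query point
    """
    M = R_train.shape[1]
    G = np.random.default_rng(seed).standard_normal((M, k)) / np.sqrt(k)
    P_train, p_new = R_train @ G, r_new @ G       # JL-style projection
    # Gaussian-kernel weights on projected prediction features
    w = np.exp(-np.sum((P_train - p_new) ** 2, axis=1) / (2 * h ** 2))
    return float(w @ y_train / w.sum())

rng = np.random.default_rng(1)
y = rng.standard_normal(100)
# 30 redundant "machines" whose predictions are the target plus small noise.
R = y[:, None] + 0.05 * rng.standard_normal((100, 30))
est = aggregate(R, y, R[0], k=5)
print(abs(est - y[0]))  # small: the aggregate tracks the truth
```

Because the kernel only compares points through their (projected) predictions, the scheme is agnostic to how each individual machine was built, which is what allows merging many redundant machines without model selection.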